Document Summarizer for Word Processors

ABSTRACT

A document summarizer for word processors is described. In one aspect, a document is accessed for summarization. Using a phase summarizing process, a sentence-based summary of writings of the document is constructed from the writings. A file associated with the document is located. The sentence-based summary is inserted into the file such that the sentence-based summary is before an opening paragraph of the document. The file is saved to non-volatile memory.

RELATED APPLICATIONS

This application is a continuation of co-pending U.S. patent application Ser. No. 10/074,951, titled “Document Summarizer for Word Processors”, filed on Feb. 11, 2002, assigned hereto, and hereby incorporated by reference. Co-pending U.S. patent application Ser. No. 10/074,951 is a continuation of U.S. patent application Ser. No. 09/289,085 (now U.S. Pat. No. 6,349,316), filed on Apr. 8, 1999, titled “Document Summarizer for Word Processors”, which is hereby incorporated by reference. U.S. Pat. No. 6,349,316 was a continuation of U.S. patent application Ser. No. 08/622,864 (now U.S. Pat. No. 5,924,108), filed on March 29, 1996, which is hereby incorporated by reference.

BACKGROUND

Many people are faced with the daunting task of reading large amounts of electronic textual materials. In the computer age, people are inundated with papers, memos, e-mail messages, reports, web pages, schedules, reference materials, test results, and so on. Unfortunately, many documents do not begin with summaries. Creation of summaries is tedious, requiring the author to re-read the document, identify major themes, and distill the main points of the document into a concise summary. Most authors never bother.

Summarizing a document is even more difficult and time-consuming for a reader. The reader must first read the entire document (or at least skim it) to understand the contents. The reader must then attempt to extract the document's key points from unimportant details.

The problems associated with handling large volumes of un-summarized documents are particularly acute for MIS (Management Information Systems) personnel. These individuals are confronted daily with tasks of organizing, managing, and retrieving documents from large databases. Imagine this typical scenario. An MIS staff member receives a cryptic request to locate all documents that pertain to a topic believed to have been discussed in several company memos written about three to four years ago. To accommodate this search request, the MIS staff member must first perform a word search for the topic, and then laboriously peruse each hit document in an effort to find the mysterious memos. Without summaries, the staff member is forced to read large portions, if not all, of each document before concluding whether the document is relevant or irrelevant. Being forced to read unnecessary text leads to many wasted hours of the staff member's time.

The problem is less critical, but still troubling, for individual users who are browsing through the Internet or other networks to find documents on a related topic. Upon locating a document, the user must either read the document online to determine whether it is relevant (at the cost of additional online expenses), or download the document for later review (at the risk of retrieving an irrelevant document).

To help address these problems, computer-implemented document summarizers have been developed to automatically summarize text-based documents for the readers. The document summarizers examine an existing document, and attempt to create an abstract or summary from the existing text.

Early development on document summarizers centered on statistical approaches to creating summaries. One statistical approach is described in an article by H. P. Luhn, entitled “The Automatic Creation of Literature Abstracts,” which was published April 1958 in the IBM Journal at pages 159-165. The Luhn technique assigns to each sentence a “significance” factor derived from an analysis of its words. This factor is computed by ascertaining a cluster of words within a sentence, counting the number of significant words contained in the cluster, and dividing the square of this number by the total number of words in the cluster. The sentences are then ranked according to their significance factor, with one or several of the highest ranking sentences being selected to form the abstract.
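The significance factor described above can be illustrated with a minimal Python sketch. How a cluster is ascertained is not detailed here, so the sketch simply takes an already-chosen cluster as input; the function name, the sample cluster, and the set of significant words are all hypothetical.

```python
def luhn_significance(cluster_words, significant_words):
    """Square of the number of significant words in the cluster,
    divided by the total number of words in the cluster."""
    if not cluster_words:
        return 0.0
    hits = sum(1 for w in cluster_words if w.lower() in significant_words)
    return hits ** 2 / len(cluster_words)

# Hypothetical cluster of five words, three of which are significant.
cluster = ["computer", "manufacturer", "offers", "a", "guarantee"]
significant = {"computer", "manufacturer", "guarantee"}
factor = luhn_significance(cluster, significant)  # 3 squared, divided by 5
```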

Most, if not all, of the document summarizers in use today appear to employ the Luhn technique. Examples of such summarizers include a Text Summariser from BT (formerly British Telecom), Visual Recall from Xsoft Corporation (a subsidiary of Xerox), and InText from Island Software.

Another approach to summarizing documents is described in an article by Kenji Ono, et al., entitled “Abstract Generation Based on Rhetorical Structure Extraction,” which was published in Proceedings of the 15th International Conference on Computational Linguistics, Vol. 1, at pages 344-348, for a conference held Aug. 5-9, 1994 in Kyoto, Japan. Their approach involved a linguistic analysis, which constructed rhetorical structures representing relations between various chunks of sentences in the body of the section. The rhetorical structure is represented by two levels: intra-paragraph, which analyzes the text according to sentence units, and inter-paragraph, which analyzes the text using paragraph units. Extraction of the rhetorical structure is accomplished using a detailed and sophisticated five-step procedure. The Ono technique is unnecessarily complicated for many situations where a rudimentary summary is all that is desired.

In addition, this technique is highly genre-dependent, producing good summaries only when the text is rich in superficial markers of its discourse structure. It thus works relatively well on the academic prose examined by Ono et al., but will fail on documents written in less formal prose.

When the summaries are created, conventional document summarizers present the results to the reader in one of two formats. The first format is to underline or otherwise highlight the sentences that are deemed to be part of the summary. The second format is to show only the abstracted sentences in paragraph or bullet format, without the accompanying text of the document.

One common problem with the conventional document summarizers is that they are reader-based. These summarizers do not consider summary creation and presentation from the perspective of the author.

Accordingly, there remains a need to provide an author-oriented summarizer for a word processor that helps authors automatically create summaries for their writings, and one which will produce a summary for any text which is presented to it.

SUMMARY

A document summarizer for word processors is described. In one aspect, a document is accessed for summarization. Using a phase summarizing process, a sentence-based summary of writings of the document is constructed from the writings. A file associated with the document is located. The sentence-based summary is inserted into the file such that the sentence-based summary is before an opening paragraph of the document. The file is saved to non-volatile memory.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagrammatic illustration of a computer loaded with a word processing program having a document summarizer, according to one embodiment.

FIG. 2 is a flow diagram of steps in a computer-implemented method for summarizing documents, according to one embodiment.

FIGS. 3a and 3b show documents with summaries inserted therein to illustrate two different display presentations of a summary, according to one embodiment.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

FIG. 1 shows a computer 20 having a central processing unit (CPU) 22, a monitor or display 24, a keyboard 26, and a mouse 28. Other input devices, such as a track ball, joystick, and the like, may be substituted for or used in conjunction with the keyboard and mouse. The CPU 22 is of standard construction, including memory (disk, RAM, graphics) and a processor.

The computer 20 runs an operating system which supports multiple applications. The operating system is stored in memory in the CPU 22 and executes on the processor. The operating system is preferably a multitasking operating system which allows simultaneous execution of multiple applications. One example operating system is a Windows® brand operating system sold by Microsoft Corporation, such as Windows® 95 or Windows NT™ or other derivative versions of Windows®. However, other operating systems may be employed, such as Mac™ OS operating systems employed in Macintosh computers manufactured by Apple Computer, Inc.

This invention concerns a document summarizer that can be implemented in a word processing system. In the illustrated system, the word processing system is implemented as a software application which is stored in the CPU memory or other loadable storage medium and runs on the operating system of computer 20. One example word processing application is Microsoft® Word from Microsoft Corporation, which is modified with the document summarizer described herein.

It is noted that the word processing system might be implemented in other ways. For instance, the word processing system might comprise a dedicated typewriter machine with limited memory and processing capabilities (in comparison to a personal computer) that is used almost exclusively for word processing tasks. It is further noted that the document summarizer described herein can be implemented in other programs, such as an Internet Web browser (e.g., Internet Explorer from Microsoft Corporation), an e-mail program (e.g., WordMail and Exchange from Microsoft Corporation), and the like. However, for discussion purposes, the document summarizer is described in the context of a computer word processing program, such as Microsoft® Word.

When an author wishes to summarize a document, the author initiates the document summarizer function on the word processing program. As used herein, the term “document” means any image that contains text in a format intended for a viewer or other computer program which will then present the text as intelligible language. Examples of documents include conventional word processing documents, e-mail messages, memoranda, web pages, and the like. The document summarizer is activated through a pull-down menu or soft button on the graphical user interface window presented by the word processor. Upon activation, the document summarizer begins processing a document to produce a summary.

FIG. 2 shows the general steps in a computer-implemented method for summarizing a document that are carried out by the computer. The method is described with additional reference to an example document containing a four-sentence paragraph, which is summarized into a two-sentence summary. The paragraph is given as follows:

    The Internet is a great place to shop for a computer. Manufacturers have web sites describing their computers. One computer manufacturer offers a money back guarantee. That is why that manufacturer has so many visits to its Internet web site.

In general, the document summarizing process involves three phases: a statistical phase, a cue-phrase phase, and a presentation phase. The statistical and cue-phrase phases are preferably conducted concurrently during a single pass through a document. However, they can be performed sequentially as well, in any order. In the statistical phase, the document summarizer begins by reading each word and counting how frequently content words appear in a document (step 40 in FIG. 2). “Content words” are those words which provide non-grammatical meaning to a text. Nouns are good examples of content words. In the above paragraph, content words include “Internet,” “manufacturer,” “computer,” and so forth.

Within the context of the summarizer, content words can be technically defined as words that are not “stop words.” In this context, the set of stop words includes both grammatical function words (e.g., conjunctions, articles, prepositions) and certain high frequency verbs and nouns (e.g., “get”, “have”) which appear to contribute relatively little semantic content to a sentence. The fundamental attribute of a stop word is that it does not directly contribute to the theme of the document, and the document is extremely unlikely to be about the stop word; therefore it should not be counted. The stop words are preferably maintained in a list stored in memory. In this manner, the processor reads every word, but only counts those words that do not appear on the stop word list. In the above sample paragraph, the first sentence contains the stop words “The,” “is,” “a,” “great,” “to,” “for,” and “a.”
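As a minimal sketch of the counting step, the following Python reads every word but counts only those not on a stop word list. The stop list shown is a hypothetical one, just large enough for the first sample sentence, and the regular-expression tokenizer is an assumption rather than the tokenizer of any particular word processor.

```python
import re
from collections import Counter

# Hypothetical stop list covering only the first sample sentence.
STOP_WORDS = {"the", "is", "a", "great", "to", "for"}

def count_content_words(text, stop_words=STOP_WORDS):
    """Read every word, but count only words absent from the stop list."""
    words = re.findall(r"[A-Za-z]+", text.lower())
    return Counter(w for w in words if w not in stop_words)

counts = count_content_words(
    "The Internet is a great place to shop for a computer."
)
```

For the first sentence alone, each of the four remaining content words is counted once.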

During the pass through the document, the document summarizer checks for morphological variants of the content words and converts them to their root form (step 42). For example, the words “walking,” “walked,” and “walks” are all morphological variants of the root form “walk.” In this way, the root form and associated variants are all counted as the same word. In the above example paragraph, the words “computer” and “computers” are counted as the same word, as are the words “manufacturer” and “manufacturers.”
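One simple way to approximate the conversion to root form is naive suffix stripping, sketched below. This is an illustrative assumption only: the text does not say how morphological variants are detected, and a real implementation would likely rely on a dictionary or a proper stemmer. The suffix list and the minimum root length are hypothetical.

```python
# Hypothetical suffix list; checked in order, longest first.
SUFFIXES = ("ing", "ed", "s")

def root_form(word):
    """Strip one common suffix, keeping at least three letters of the root."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

variants = [root_form(w) for w in ("walking", "walked", "walks")]
```

With this rule, “computers” and “manufacturers” reduce to the same forms as “computer” and “manufacturer”, so each pair is counted as one word.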

The summarizer also analyzes the words for possible phrase compression (step 44). Sets of content words that appear repeatedly in the same order are counted as if they are a single content word. For example, the word pair, “Microsoft Corporation,” if occurring a sufficient number of times in that exact order, might be counted as a single word. The words in such phrases, if taken separately, do not by themselves add any meaning to the sentence. Without phrase compression, the words “Microsoft” and “Corporation” would each be counted independently, a result which might undesirably skew the importance of the sentences that contain them. In the above example paragraph, the phrase “web site” occurs the same way on two occasions and might therefore be a candidate for phrase compression. Also assume that the phrase “money back guarantee” is compressed into a one-word phrase that is counted singly.
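Phrase compression for word pairs might be sketched as follows. The threshold of two occurrences, the restriction to pairs, and the treatment of the document as one flat stream of content words are all assumptions made for illustration; the text only requires that the words occur “a sufficient number of times in that exact order.”

```python
from collections import Counter

def compress_phrases(tokens, min_count=2):
    """Merge adjacent word pairs that recur at least min_count times."""
    pairs = Counter(zip(tokens, tokens[1:]))
    frequent = {p for p, n in pairs.items() if n >= min_count}
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in frequent:
            merged.append(" ".join(pair))  # count the pair as one word
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Hypothetical stream of root-form content words from the sample paragraph.
tokens = ["manufacturer", "web", "site", "computer",
          "visit", "internet", "web", "site"]
compressed = compress_phrases(tokens)
```

Here “web site” occurs twice in the same order and is therefore treated as a single content word with a count of two.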

When all of the content words in the document are counted, the document summarizer produces a table which correlates the content words with their corresponding frequency counts (step 46). The content words can be ordered with the most frequently occurring words appearing at the top of the table. Table 1 shows a ranking of content words from the above example document:

TABLE 1
Rank of Content Words

Content Word            Frequency Count
Computer                3
Manufacturer            3
Internet                2
web site                2
Place                   1
Shop                    1
money back guarantee    1
Visit                   1

At step 48, the document summarizer derives a sentence score for individual sentences within the document according to their respective content words. Sentences with more content words that appear more frequently in the document are ranked higher than both sentences with fewer high-frequency content words and sentences with content words that appear less frequently in the document. More specifically, the document summarizer ranks the sentences according to their average word score. This value is derived by summing the frequency counts for all content words that appear in the sentence and dividing that tally by the number of the content words in the sentence. The sentence score is represented as follows:

Sentence Score = Sum of Word Frequency Counts ÷ Number of Words

The sentences are then ranked in order of their sentence scores (step 50 in FIG. 2). Higher ranking sentences have comparatively higher sentence scores and lower ranking sentences have comparatively lower sentence scores. Using the word counts in Table 1, the score for the first sentence in the example paragraph is 1.75, as follows:

Sentence #1 = [Internet(2) + Place(1) + Shop(1) + Computer(3)] ÷ 4 Words = 1.75

Scores for the remaining three sentences are also computed. Table 2 shows the ranking for the four sentences in the example paragraph.

TABLE 2
Rank of Sentences

Sentence                                                      Score
#2 Manufacturers have web sites describing their . . .       2.67
#3 One computer manufacturer offers a money back . . .       2.33
#4 That is why that manufacturer has so many visits to . . . 2.00
#1 The Internet is a great place to shop for a computer.     1.75
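The scores in Table 2 can be checked with a short Python sketch that applies the average word score to the counts of Table 1. The content words listed for sentence #1 come from the worked example above; the lists for the other sentences are inferred from their published scores and should be read as assumptions. Variable and function names are hypothetical.

```python
# Frequency counts from Table 1.
COUNTS = {"computer": 3, "manufacturer": 3, "internet": 2, "web site": 2,
          "place": 1, "shop": 1, "money back guarantee": 1, "visit": 1}

# Content words per sentence (sentences #2 to #4 are inferred).
SENTENCES = {
    1: ["internet", "place", "shop", "computer"],
    2: ["manufacturer", "web site", "computer"],
    3: ["computer", "manufacturer", "money back guarantee"],
    4: ["manufacturer", "visit", "internet", "web site"],
}

def sentence_score(words, counts):
    """Sum of word frequency counts divided by the number of content words."""
    return sum(counts[w] for w in words) / len(words) if words else 0.0

scores = {n: round(sentence_score(ws, COUNTS), 2) for n, ws in SENTENCES.items()}
ranking = sorted(scores, key=lambda n: -scores[n])  # highest score first
```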

It is noted that other techniques could be used to derive a sentence score. For example, the score might be calculated by dividing the total frequency count by the total number of all words (including stop words) in the sentence. An alternative approach is to simply sum the content word counts, without any averaging. Additionally, arithmetic and statistical tricks can be used, such as basing the sentence score on a median score of a content word.
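The three alternatives just mentioned can be compared on the first sample sentence with the sketch below. The function names are hypothetical, and the figure of eleven total words is obtained by counting every word of that sentence, stop words included.

```python
from statistics import median

def score_all_words(content_counts, total_words):
    """Total frequency count divided by all words, stop words included."""
    return sum(content_counts) / total_words

def score_sum(content_counts):
    """Plain sum of the content word counts, with no averaging."""
    return sum(content_counts)

def score_median(content_counts):
    """Median frequency count of the sentence's content words."""
    return median(content_counts)

# Sentence #1: Internet(2), Place(1), Shop(1), Computer(3); 11 words in all.
s1_counts = [2, 1, 1, 3]
alt_scores = (score_all_words(s1_counts, 11),
              score_sum(s1_counts),
              score_median(s1_counts))
```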

Steps 40-50 constitute the statistical phase of the summarizing method. Concurrent with the statistical phase, the document summarizer performs during the same pass through the document a cue-phrase analysis to exploit any explicit discourse markers present in the text. In general, the cue-phrase analysis seeks to identify phrases that might potentially render a sentence confusing or difficult to understand if included in the summary. In this implementation, the document summarizer compares the sentence string to a pre-compiled list of words and phrases (step 52).

Upon identification of words or phrases that appear on the list, the document summarizer designates the entire sentence as either “prohibited” or “conditioned.” If a sentence is “prohibited,” the document summarizer takes action to prevent the sentence from being included in the summary, regardless of its sentence score (steps 54 and 56). If a sentence is deemed “conditioned,” the document summarizer will only include the sentence in the summary if the condition is met (steps 58 and 60). One example of a conditioned sentence is one that depends on the previous sentence or surrounding context to understand its meaning. A sentence that begins “He said . . . ” is only clear if the reader knows who “He” is. Accordingly, this sentence depends on a previous context and will be used in the summary only if the previous sentence identifying “He” is also used in the summary.

Table 3 shows example words and phrases from the pre-compiled cue-phrase list that render a sentence as “prohibited” or “conditioned.”

TABLE 3
Cue-Phrase List

Conditional Words or Phrases
    Sentence-initial Personal Pronouns: He, She, It, They, Their
    Sentence-initial Demonstrative Pronouns: These, That, This, Those
    Sentence-initial Quantifiers: All, Most, Many, Both, Which
    Conjunction (e.g., And, Nor, But, Or, Yet, So, For)
    Specific Reference (e.g., Such, That is)
    Extension (e.g., Related to this)
    Causation (e.g., Therefore, Thus, And so)
    Contrast (e.g., However, Nonetheless, In spite of this)
    Reinforcement (e.g., Indeed, Accordingly)
    Supplementation (e.g., At any rate, In reply)

Prohibited Words or Phrases
    Reference (e.g., In FIG. 1 . . . , as shown in Chart A)
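A cue-phrase check along these lines might be sketched as follows. The two lists are small hypothetical excerpts drawn from Table 3, the simple string matching is an assumption, and the label “free” for a sentence with no cue phrase is introduced here only for illustration.

```python
# Hypothetical excerpts of the pre-compiled cue-phrase list (see Table 3).
CONDITIONED_OPENERS = {"he", "she", "it", "they", "these", "that", "this",
                       "those", "therefore", "thus", "however"}
PROHIBITED_PHRASES = ("in fig.", "as shown in")

def classify(sentence):
    """Return 'prohibited', 'conditioned', or 'free' for one sentence."""
    lowered = sentence.lower()
    if any(phrase in lowered for phrase in PROHIBITED_PHRASES):
        return "prohibited"
    words = lowered.split()
    first = words[0].strip(",.") if words else ""
    if first in CONDITIONED_OPENERS:
        return "conditioned"  # depends on the previous sentence
    return "free"

label = classify(
    "That is why that manufacturer has so many visits to its Internet web site."
)
```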

Applying the cue phrase analysis to the sample paragraph reveals that the fourth sentence is conditioned because it contains the phrase “That is why . . . ” This phrase is listed on the cue-phrase list as a depends-on-previous phrase, meaning that the phrase relies on a previous sentence for context. In this case, the preceding third sentence explains that one manufacturer offers a money back guarantee, which is the supporting reason why the manufacturer is said, in the fourth sentence, to have many visits to its web site. Were the fourth sentence to appear in a summary without the third sentence, a reader would not understand why the manufacturer has so many visits to its web site. Accordingly, the document summarizer sets a condition that the fourth sentence is only used in the summary if the third sentence is also used.

In this example, it turns out that even without the cue phrase list, the fourth sentence will only appear if the third sentence is also used, for the simple reason that the third sentence has a higher score than the fourth sentence. This result is the product of a short document with few sentences. However, in larger documents with more sentences, the cue-phrase list will effectively institute conditions on certain sentence uses. For instance, suppose that the fourth sentence in the above four-sentence paragraph had a higher sentence score than the third sentence. In this case, the fourth sentence is only used if the lower scoring, preceding third sentence is used.

Following the statistical and cue-phrase analysis phases, the document summarizer creates a summary containing the higher ranked sentences which survive the cue-phrase analysis (step 62). The summary may include a conditioned sentence in the event that the relevant condition is satisfied, but will exclude any prohibited sentences. The length of the summary is an author-controlled parameter. From Table 2, a two-sentence summary for the above sample paragraph is as follows:

    Manufacturers have web sites describing their computers. One computer manufacturer offers a money back guarantee.

The two sentences in the summary had the highest ranking. It is noted that the sentences are organized in the summary according to their order of appearance in the document, not in order of their rank. In this case, the appearance and rank order are the same, but this does not have to be the case. For example, assume that the third sentence received a higher rank than the second sentence. In the resultant summary, the lower-ranked second sentence would still precede the higher-ranked third sentence because it appears before the third sentence in the document. Ordering a summary based on rank reorganizes the author's sentence sequence and might result in a confusing and less readable summary.

The two-sentence summary did not contain any cue-phrase sentences. However, were the summary expanded to three sentences, it would read as follows:

    Manufacturers have web sites describing their computers. One computer manufacturer offers a money back guarantee. That is why that manufacturer has so many visits to its Internet web site.

In this summary, the last sentence (i.e., the original fourth sentence) had the third highest sentence score (see Table 2). This sentence also happens to be a conditioned sentence because it contains the phrase “That is why . . . ” which appears on the pre-compiled cue-phrase list. Accordingly, the sentence is used only if the condition is met. In this case, the condition is a depends-on-previous condition, which stipulates that a sentence belonging to this class can be included in a summary only if the preceding sentence is also included. Since the third sentence does appear in the summary, the depends-on-previous condition is met and hence, the fourth sentence can be included in the summary.
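The selection rules discussed above (skip prohibited sentences, admit a conditioned sentence only once its predecessor is in the summary, and emit the result in document order) can be combined in a short sketch. The function name and the repeated-pass strategy are assumptions; the text does not prescribe a particular selection loop.

```python
def build_summary(scores, conditioned, prohibited, length):
    """Pick up to `length` sentence numbers; return them in document order."""
    ranked = [n for n in sorted(scores, key=lambda k: (-scores[k], k))
              if n not in prohibited]
    chosen = set()
    progress = True
    while len(chosen) < length and progress:
        progress = False
        for n in ranked:
            if n in chosen:
                continue
            # Depends-on-previous: the preceding sentence must already be in.
            if n in conditioned and (n - 1) not in chosen:
                continue
            chosen.add(n)
            progress = True
            break  # restart from the top of the ranking
    return sorted(chosen)  # document order, not rank order

table2 = {1: 1.75, 2: 2.67, 3: 2.33, 4: 2.00}
two = build_summary(table2, conditioned={4}, prohibited=set(), length=2)
three = build_summary(table2, conditioned={4}, prohibited=set(), length=3)
```

With the Table 2 scores this selects sentences #2 and #3 for a two-sentence summary, and adds sentence #4 for a three-sentence summary because sentence #3 is already present.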

After the summary is created, the document summarizer displays the summary on the computer monitor in one of four author-selected UI (user interface) formats (step 64). The first UI format is to insert the summary at the top of the existing document. The document summarizer locates the top of the file, and inserts the summary text before the opening paragraph of the document. FIG. 3a shows an existing document 70 with a summary 72 inserted at the top. A second UI format is to create or open a new document and insert the summary in the new document. FIG. 3b illustrates a new document 74 opened and overlaid on an existing document 70. The summary 72 is inserted in the new document 74.

The third UI format is to underline or otherwise highlight the important sentences used in the summary. The fourth UI format is to show only the summary sentences without the accompanying text. These third and fourth formats are similar to the conventional presentations described in the Background Section.

Once the summary is created and displayed to the author, the author can save the summary in the existing document or new document to memory (step 66).
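For the first UI format, inserting the summary before the opening paragraph and saving the file might look like the sketch below. It assumes a plain-text file purely for illustration; an actual word processing file format would require that application's own document model. The file name and function name are hypothetical.

```python
import tempfile
from pathlib import Path

def insert_summary_at_top(path, summary):
    """Place the summary before the opening paragraph and save the file."""
    path = Path(path)
    body = path.read_text(encoding="utf-8")
    path.write_text(summary + "\n\n" + body, encoding="utf-8")

# Demonstration on a temporary plain-text document.
with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "memo.txt"
    doc.write_text("Opening paragraph.\n", encoding="utf-8")
    insert_summary_at_top(doc, "Summary sentence.")
    result = doc.read_text(encoding="utf-8")
```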

A modification of the above computer-implemented method concerns the statistical phase. In the method described above, the content words are counted and all of the sentence scores are derived using the same frequency counts. In some instances, certain words in the higher ranking sentences may unduly dominate and influence the scores of the sentences.

The modified technique is an iterative scoring approach. Under this technique, the summarizer initially scores all of the sentences as above on the first iteration. Then, for the next iteration, the summarizer removes the influence of the highest ranking sentence and re-scores the remaining sentences as if the highest ranking sentence were not present. For the next iteration, the influence of the highest scoring sentence found in the previous iteration is removed, and the remaining sentences are again re-scored as if the two highest ranking sentences were not present. This process continues for all of the sentences.

To demonstrate this modified statistical analysis, let's apply the analysis to the four-sentence paragraph used above. The first step is to count the content words, while accounting for the stop words and phrase compression. The word count yields Table 1. Next, the sentence scores are derived. The first iteration yields the same score of 2.67 for sentence #2. Here, however, is where the modified method begins to diverge. To remove the influence of the highest ranking sentence, the document summarizer re-computes the sentence scores as if the second sentence were never present in the document. The frequency counts of the content words are reduced accordingly. Table 4 is a modified version of Table 1 and reflects the absence of the second sentence.

TABLE 4
Rank of Content Words With Second Sentence Omitted

Content Word     Frequency Count
Computer         3 − 1 = 2
Manufacturer     3 − 1 = 2
Internet         2
web site         2 − 1 = 1
Place            1
Shop             1
Money            1
Visit            1

Next, the remaining three sentences are re-scored using the modified frequency counts for the content words. This results in a score of 1.67 for sentence #3, which is the second highest.

Sentence #3 = [computer(2) + manufacturer(2) + money(1)] ÷ 3 Words = 1.67

The influence of sentence #3 is then removed, and the frequency counts of the content words are reduced accordingly. Table 5 is a modified version of Table 4 and accounts for the absence of the second and third sentences.

TABLE 5
Rank of Content Words With Second and Third Sentences Omitted

Content Word     Frequency Count
Computer         3 − 2 = 1
Manufacturer     3 − 2 = 1
Internet         2
web site         2 − 1 = 1
Place            1
Shop             1
Money            1 − 1 = 0
Visit            1

Continuing this process through the remaining two sentences yields a new sentence rank, given in Table 6.

TABLE 6
Rank of Sentences With Iterative Re-Scoring Method

Sentence                                                      Score
#2 Manufacturers have web sites describing their . . .       2.67
#3 One computer manufacturer offers a money back . . .       1.67
#1 The Internet is a great place to shop for a computer.     1.33
#4 That is why that manufacturer has so many visits to . . . 1.00
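The iterative re-scoring loop can be sketched in Python as follows. The per-sentence word lists are inferred as before, ties are broken in favor of the earlier sentence, and counts are floored at zero; all three are assumptions. The sketch reproduces the order of Table 6 and the worked scores of the first two iterations (2.67 and 1.67); scores for the later iterations depend on details that are not spelled out, so they may not match Table 6 digit for digit.

```python
def iterative_rank(sentences, counts, step=1.0):
    """Repeatedly pick the top sentence, then discount its words' counts."""
    counts = dict(counts)
    remaining = dict(sentences)
    ranked = []
    while remaining:
        scores = {n: sum(counts[w] for w in ws) / len(ws)
                  for n, ws in remaining.items()}
        # Ties go to the sentence that appears first in the document.
        best = max(sorted(scores), key=lambda n: scores[n])
        ranked.append((best, round(scores[best], 2)))
        for w in remaining.pop(best):
            counts[w] = max(0.0, counts[w] - step)  # remove its influence
    return ranked

COUNTS = {"computer": 3, "manufacturer": 3, "internet": 2, "web site": 2,
          "place": 1, "shop": 1, "money": 1, "visit": 1}
SENTENCES = {
    1: ["internet", "place", "shop", "computer"],
    2: ["manufacturer", "web site", "computer"],
    3: ["computer", "manufacturer", "money"],
    4: ["manufacturer", "visit", "internet", "web site"],
}
ranked = iterative_rank(SENTENCES, COUNTS)
```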

Notice that using the iterative re-scoring method yields a slightly different sentence ranking, with sentence #1 being ranked higher than sentence #4. A two-sentence summary using the iterative re-scoring method is identical to the two-sentence summary created using the method described above. However, a three-sentence summary is considerably different. A three-sentence summary using Table 6 is as follows:

    The Internet is a great place to shop for a computer. Manufacturers have web sites describing their computers. One computer manufacturer offers a money back guarantee.

This three-sentence summary is a good example of the situation where the sentences used in the summary are written in order of the appearance in the document, and not in order of their rank. The beginning sentence in the summary is actually the third highest ranked sentence. Nonetheless, it is written in the summary as the first sentence because it appears in the document before the higher-ranked sentences #2 and #3.

In the above example, the counts of the content words appearing in the higher ranking sentences are all reduced by a full count. In other implementations, the frequency counts can be changed by varying degrees depending upon the degree of influence introduced by the higher ranking sentences the manufacturer or author desires to remove. For instance, the summarizer might compensate by subtracting a fractional amount (say, 0.3 or 0.5) from each count corresponding to words that appear in the highest ranking sentence. Alternatively, the compensation amount might vary depending upon whether the content word has a high or low frequency count compared to other content words. The amount that word counts are compensated during this dynamic scoring process can be determined and set by the manufacturer or author according to various statistical or mathematical approaches which appropriately negate the influence of the content words appearing in the higher ranking sentences.
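Two possible compensation rules are sketched below: a fixed fractional subtraction, as in the 0.5 example above, and a reduction proportional to each word's current count. The proportional rule is only one assumed way of making the amount depend on whether a word has a high or low frequency count; both function names are hypothetical.

```python
def compensate_fixed(counts, selected_words, amount=0.5):
    """Subtract a fixed amount from each word of the selected sentence."""
    adjusted = dict(counts)
    for w in selected_words:
        adjusted[w] = max(0.0, adjusted[w] - amount)
    return adjusted

def compensate_proportional(counts, selected_words, fraction=0.5):
    """Reduce each selected word's count by a fraction of its own value."""
    adjusted = dict(counts)
    for w in selected_words:
        adjusted[w] = adjusted[w] * (1.0 - fraction)
    return adjusted

counts = {"computer": 3, "manufacturer": 3, "web site": 2, "internet": 2}
top_sentence = ["manufacturer", "web site", "computer"]
fixed = compensate_fixed(counts, top_sentence, amount=0.5)
proportional = compensate_proportional(counts, top_sentence, fraction=0.5)
```

Under the fixed rule every affected count drops by the same 0.5, whereas under the proportional rule high-frequency words lose more than low-frequency words.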

The document summarizer is advantageous over prior art summarizers because it is designed from the author's standpoint. It enables authors to automatically create summaries of their writings using a combined statistical and cue-phrase approach. Once created, the summarizer presents a UI that enables the author to place the summary at the top of the document or in a new document. This placement is convenient and useful to the author. The author is then free to revise the summary as he/she wishes.

Another advantage of the document summarizer stems from the combined statistical and cue phrase processing. This dual analysis is beneficial because the statistical component ensures that a summary will always be produced, and the cue phrase component improves the quality of the resulting summary.

In compliance with the statute, the invention has been described in language more or less specific as to structure and method features. It is to be understood, however, that the invention is not limited to the specific features described, since the means herein disclosed comprise exemplary forms of putting the invention into effect. The invention is, therefore, claimed in any of its forms or modifications within the proper scope of the appended claims appropriately interpreted in accordance with the doctrine of equivalents and other applicable judicial doctrines.

1. In a document summarizing application executing on a processing device, a computer-implemented method comprising: accessing a document; constructing, using a phase summarizing process, a sentence-based summary of writings of the document from the writings; locating a file associated with the document; inserting the sentence-based summary into the file such that the sentence-based summary is before an opening paragraph of the document; and saving the file to non-volatile memory.

2. A word processing application stored in a storage medium which directs a computer to perform the computer-implemented method as recited in claim 1.

3. An electronic mail application stored in a storage medium which directs a computer to perform the computer-implemented method as recited in claim 1.

4. An Internet web browser application stored in a storage medium which directs a computer to perform the computer-implemented method as recited in claim 1.

5. A computer-readable medium comprising computer program instructions that, when executed by a processor, are for performing operations on a computing device comprising: accessing a document; constructing a textual content-based summary of a document's writings of the document from the writings; and inserting the textual content-based summary into the document such that the textual content-based summary is before an opening paragraph of at a beginning of the document and on a common page with starting content of the document; and saving the document.

6. The computer-readable medium of claim 5, wherein the computer program instructions are implemented by a word processing application.

7. The computer-readable medium of claim 5, wherein the computer program instructions are implemented by an electronic mail application.

8. The computer-readable medium of claim 5, wherein the computer program instructions are implemented by an Internet web browser application.

9. The computer-readable medium of claim 5, wherein the computer program instructions for saving the document save the document in non-volatile memory.

10. The computer-readable medium of claim 5, wherein the computer program instructions for saving the document save the document into a database.

11. A computing device comprising: a processor; and memory coupled to the processor, the memory comprising computer program instructions that, when executed by a processor, are for performing operations on a computing device comprising: accessing a document; constructing a textual content-based summary of a document's writings of the document from the writings; and inserting the textual content-based summary into the document such that the textual content-based summary is before an opening paragraph of at a beginning of the document and on a common page with starting content of the document; and saving the document.

12. The computing device of claim 11, wherein the computer program instructions are implemented by a word processing application.

13. The computing device of claim 11, wherein the computer program instructions are implemented by an electronic mail application.

14. The computing device of claim 11, wherein the computer program instructions are implemented by an Internet web browser application.

15. The computing device of claim 11, wherein the computer program instructions for saving the document save the document in non-volatile memory.

16. The computing device of claim 11, wherein the computer program instructions for saving the document save the document into a database.